feat: add focused LLM slice files (llms-api.txt, llms-node-ops.txt) #802

Open
crtahlin wants to merge 4 commits into ethersphere:master from crtahlin:feat/llm-slice-files

Conversation

@crtahlin
Collaborator

Summary

Adds two task-specific documentation bundles so AI coding agents can load only what's relevant instead of the full 630KB llms-full.txt:

  • llms-api.txt (220KB, 20 docs) — API usage, uploads, stamps, feeds, chunks, encryption, developer tooling
  • llms-node-ops.txt (225KB, 22 docs) — installation, configuration, monitoring, staking, backups, upgrades, FAQ

Uses docusaurus-plugin-llms customLLMFiles — glob patterns auto-include new pages, no manual maintenance.

Also adds discovery links in static/llms.txt and extends the validation script.

Refs: ethersphere/DevRel#840

Maintenance

  • Auto-maintained: slice patterns use directory globs, so new docs added under docs/develop/ or docs/bee/installation/ are automatically included
  • Validation: scripts/validate-llms-txt.mjs logs referenced slice files
  • No manual file curation needed — the plugin generates from patterns at build time
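
As a rough illustration of the glob-based setup described above, the plugin options might look something like the sketch below. The option names `customLLMFiles`, `includePatterns`, `fullContent`, `description`, and `rootContent` are the ones discussed in this PR; the specific glob strings and the `rootContent` wording are assumptions for illustration, not the configuration actually committed here.

```javascript
// Hypothetical excerpt of docusaurus.config.js. The patterns and text below
// are illustrative assumptions, not the configuration committed in this PR.
const llmsPluginOptions = {
  customLLMFiles: [
    {
      filename: 'llms-api.txt',
      // Directory glob: pages added under docs/develop/ later are picked up
      // at build time without editing this list.
      includePatterns: ['docs/develop/**/*.md'],
      fullContent: true,
      description:
        'API usage, uploads, stamps, feeds, chunks, encryption, developer tooling',
      rootContent: 'Focused slice for agents working with the API.',
    },
    {
      filename: 'llms-node-ops.txt',
      includePatterns: ['docs/bee/installation/**'],
      fullContent: true,
      description:
        'Installation, configuration, monitoring, staking, backups, upgrades, FAQ',
      rootContent: 'Focused slice for agents operating a node.',
    },
  ],
};

// Standard Docusaurus [name, options] tuple for registering a plugin.
const plugins = [['docusaurus-plugin-llms', llmsPluginOptions]];
```

Because each slice is driven by patterns rather than a hand-kept file list, a typo in a pattern produces a smaller slice rather than a build failure, so the patterns themselves are worth checking.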

Test plan

  • npm run build succeeds — both files generated
  • llms-api.txt: 20 documents, correct header/rootContent
  • llms-node-ops.txt: 22 documents, correct header/rootContent
  • static/llms.txt references both slices
  • Validation script detects slice references

Add two task-specific documentation bundles via docusaurus-plugin-llms
customLLMFiles, so AI agents can load only the docs relevant to their task
instead of the full 630KB llms-full.txt:

- llms-api.txt (220KB, 20 docs): API usage, uploads, stamps, feeds, chunks,
  encryption, developer tooling
- llms-node-ops.txt (225KB, 22 docs): installation, configuration, monitoring,
  staking, backups, upgrades, FAQ

Both use glob patterns so new pages added under those directories are
automatically included — no manual maintenance needed.

Also adds slice file references to static/llms.txt for agent discovery,
and extends the validation script to log referenced slice files.

Refs: ethersphere/DevRel#840
@netlify

netlify Bot commented May 13, 2026

Deploy Preview for test-twitter-preview-testing-3 ready!

Name Link
🔨 Latest commit 22ac0bd
🔍 Latest deploy log https://app.netlify.com/projects/test-twitter-preview-testing-3/deploys/6a0c8a6679911b0008fe9ed3
😎 Deploy Preview https://deploy-preview-802--test-twitter-preview-testing-3.netlify.app

@crtahlin crtahlin requested a review from darkobas2 May 18, 2026 08:50
Contributor

@darkobas2 darkobas2 left a comment


Nice idea — the slices are exactly what agents need to skip the 630KB monolith. A few things before merge:

1. CLAUDE.md looks like a personal config leaking into the upstream repo

- **Always use the `crtahlin` fork/repo** for creating issues, branches, and all GitHub operations — never the upstream `ethersphere` repo.

This is committed to ethersphere/bee-docs (the upstream). If another contributor clones the repo and uses Claude Code, they'll get told to push to your fork. That's clearly not what you want for everyone else.

Two options:

  • Keep CLAUDE.md here but strip personal/workflow instructions — leave only the things that apply to every contributor (project overview, commands, architecture, conventions).
  • Move the personal bits to ~/.claude/CLAUDE.md (user-scoped, not committed) or to a .claude/ file you keep in your fork only.

Same goes for **Never mention Claude** — that's a defensible project rule, but you might want to phrase it as "no AI-attribution noise in commits/issues" since CLAUDE.md itself is literally mentioning Claude in the repo.

2. Two includePatterns don't match any file — silently dropped

I checked the actual docs/develop/ tree against the patterns:

  • docs/develop/act.md — doesn't exist. The file you almost certainly want is docs/develop/access-control.md (ACT == Access Control Trie).
  • docs/develop/gateway-proxy.md — doesn't exist as a top-level develop file. There's docs/develop/tools-and-features/gateway-proxy.md (which is already listed) and docs/develop/gateway.md (not listed — was that the intended addition?).

The plugin silently drops patterns that don't match, which is why your build still passes and the PR description's "20 docs" count comes out right — but you're losing access-control content.

3. Validation script doesn't catch the above

const sliceRe = /https:\/\/docs\.ethswarm\.org\/(llms-[a-z-]+\.txt)/g;

This only confirms that names referenced in llms.txt look like slice file names; it doesn't verify that any includePatterns actually resolve to files. A typo in customLLMFiles (like act.md) is invisible until someone diffs the generated slice contents.
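
To make the limitation concrete, here is a small sketch of what that regex extracts. The `sliceRe` expression is the one quoted above; the `referencedSlices` helper and the sample llms.txt lines are hypothetical and only there for illustration.

```javascript
// The regex quoted from the validation script.
const sliceRe = /https:\/\/docs\.ethswarm\.org\/(llms-[a-z-]+\.txt)/g;

// Hypothetical helper: collect every slice file name referenced in llms.txt.
function referencedSlices(llmsTxt) {
  return [...llmsTxt.matchAll(sliceRe)].map((match) => match[1]);
}

// Assumed shape of the discovery links; the real static/llms.txt may differ.
const sample = [
  '- [API slice](https://docs.ethswarm.org/llms-api.txt)',
  '- [Node ops slice](https://docs.ethswarm.org/llms-node-ops.txt)',
].join('\n');

// Logs the two slice names: llms-api.txt and llms-node-ops.txt.
console.log(referencedSlices(sample));
```

Any URL of the right shape passes, including a name that no `customLLMFiles` entry generates, and nothing here looks at the filesystem.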

Worth adding a check that resolves the globs and warns on patterns matching zero files. Same idea as --fail-on-glob-mismatch in other tooling. Otherwise this regresses silently next time someone refactors filenames.
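
A minimal sketch of such a check follows, assuming the list of repository files is gathered elsewhere and passed in. `globToRegExp` and `findUnmatchedPatterns` are hypothetical helpers, the matcher handles only `*`, `**`, and `?`, and `quick-start.md` is a made-up file name; a real script would more likely lean on an existing glob implementation.

```javascript
// Convert a simple glob ('*', '**', '?') into an anchored RegExp.
function globToRegExp(glob) {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === '*' && glob[i + 1] === '*') {
      i++; // consume the second '*'
      if (glob[i + 1] === '/') {
        i++; // '**/' matches zero or more whole directories
        re += '(?:.*/)?';
      } else {
        re += '.*'; // trailing '**' matches anything below
      }
    } else if (c === '*') {
      re += '[^/]*'; // '*' stays within one path segment
    } else if (c === '?') {
      re += '[^/]';
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp('^' + re + '$');
}

// Return every { filename, pattern } pair whose pattern matches no file.
function findUnmatchedPatterns(customLLMFiles, repoFiles) {
  const problems = [];
  for (const slice of customLLMFiles) {
    for (const pattern of slice.includePatterns) {
      const re = globToRegExp(pattern);
      if (!repoFiles.some((file) => re.test(file))) {
        problems.push({ filename: slice.filename, pattern });
      }
    }
  }
  return problems;
}

// Illustrative inputs: the first pattern is the typo discussed in this review.
const repoFiles = [
  'docs/develop/access-control.md',
  'docs/develop/tools-and-features/gateway-proxy.md',
  'docs/bee/installation/quick-start.md', // made-up file name
];
const slices = [
  { filename: 'llms-api.txt', includePatterns: ['docs/develop/act.md', 'docs/develop/**/*.md'] },
  { filename: 'llms-node-ops.txt', includePatterns: ['docs/bee/installation/**'] },
];

for (const { filename, pattern } of findUnmatchedPatterns(slices, repoFiles)) {
  console.warn(`${filename}: pattern matches no files: ${pattern}`);
}
```

Returning the unmatched pairs instead of throwing leaves it to the caller to decide whether a miss is a warning or a build failure.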

Minor

  • description and rootContent mostly duplicate each other; if both are emitted into the slice header that's fine, but worth confirming with a quick head on the generated files.
  • fullContent: true on both — is there a use case for the trimmed-content variant for an even smaller slice? Not blocking, just curious about the size/value tradeoff.

Architecture (customLLMFiles + auto-include via globs + discovery links in llms.txt) is solid. Fix items 1 and 2, ideally 3, and this is good to go.

Contributor

@darkobas2 darkobas2 left a comment


Nice cleanup — the slice idea is exactly what coding agents need. Three things to address before merge:

🔴 Two llms-api.txt paths don't exist in the repo

Cross-checked the includePatterns array against master:

| Pattern | Status |
| --- | --- |
| docs/develop/act.md | ❌ does not exist (likely meant docs/develop/access-control.md) |
| docs/develop/gateway-proxy.md | ❌ does not exist (the file is at docs/develop/tools-and-features/gateway-proxy.md, which is already listed; looks like an accidental duplicate at the wrong path) |

The array has 22 entries but the PR description / build log shows "20 documents" — i.e. these two paths matched nothing and were silently dropped. The slice is missing ACT (access control) entirely, which is exactly the kind of thing a developer agent will be asked about.

Fix: replace docs/develop/act.md with docs/develop/access-control.md, and drop the duplicate docs/develop/gateway-proxy.md line.

🟡 CLAUDE.md is personal config, not upstream config

Two rules in the added file are author-specific and don't belong in ethersphere/bee-docs:

Always use the crtahlin fork/repo for creating issues, branches, and all GitHub operations — never the upstream ethersphere repo.

Never mention Claude in any commit messages, issue titles, issue bodies, branch names, or any other visible output. Do not reference AI assistance.

Both of these are workflow preferences for you working from your fork. If another contributor (or another agent) clones upstream and reads this CLAUDE.md, they'll be told to push to your fork — which is wrong. Suggest either:

  • Drop the CLAUDE.md from this PR entirely and keep it as a local-only file (gitignored), or
  • Keep an upstream CLAUDE.md but limit it to repo-neutral content (project overview, build commands, conventions) — strip the personal rules.

The "Project Overview / Architecture / Commands / Conventions" sections are useful and worth keeping if you split them.

🟡 validate-llms-txt.mjs change doesn't actually validate

The new block is labelled "Verify slice file references … point to files that will be generated" but the code just logs the references — it doesn't check the include patterns resolve to real markdown files. Had it actually globbed includePatterns against the filesystem, it would have caught the two missing paths above. Worth tightening, since the whole "auto-maintained via globs" story relies on these patterns not silently no-op-ing.

✅ Looks good

  • Glob-based docs/bee/installation/** for llms-node-ops.txt — all 11 install pages resolve.
  • All 12 working-with-bee/* paths in llms-node-ops.txt exist.
  • static/llms.txt discovery links are clean.
  • Splitting a 630KB blob into 220KB / 225KB task-specific bundles is the right move for context budgets.

- Remove CLAUDE.md from repo (personal config) and add to .gitignore
- Replace docs/develop/act.md with docs/develop/access-control.md
- Remove duplicate docs/develop/gateway-proxy.md
- Add includePatterns file-existence check to validate-llms-txt.mjs:
  globs checked via globSync, exact paths via readFileSync — would have
  caught both broken patterns before this review
@crtahlin
Collaborator Author

Thanks for the thorough review — all three items addressed in 22ac0bd:

  1. Broken paths — replaced docs/develop/act.md with docs/develop/access-control.md, removed duplicate docs/develop/gateway-proxy.md. Include patterns: 22 → 20 entries, all resolving.

  2. CLAUDE.md — removed from the PR entirely and added to .gitignore. Keeping it as a local-only file in my fork.

  3. Validation script — added step 4 to validate-llms-txt.mjs that checks every includePatterns entry against the filesystem (glob patterns via globSync, exact paths via readFileSync). Would have caught both broken paths before the review.

@crtahlin crtahlin requested a review from darkobas2 May 19, 2026 16:07
Contributor

@darkobas2 darkobas2 left a comment


Re-reviewed. All three prior blockers are addressed:

  • CLAUDE.md removed from commit; now in .gitignore
  • act.md → access-control.md and the gateway-proxy path fixed; verified all 22 API + 11 node-ops paths + docs/bee/installation/** resolve on the head commit ✓
  • Validation script now walks includePatterns and warns on missing files ✓

One small nit (won't block): the validator isn't wired into a package.json script or CI workflow, so the new pattern check won't actually run unless invoked manually. Worth a follow-up npm run validate-llms + a CI step.

LGTM, ready to merge.
